Abstract



Exact expressions for the distribution function of a random variable of the form $((a_1\chi^2_{m_1} + a_2\chi^2_{m_2})/|m|)/(\chi^2_\nu/\nu)$ are given, where the chi-square distributions are independent with degrees of freedom $m_1$, $m_2$, and $\nu$, respectively. Applications to detecting joint outliers and to Hotelling's misspecified $T^2$ distribution are given.

Key Words: Generalized F distribution, hypergeometric functions, Cook's $D_I$ statistic, outliers, misspecified Hotelling $T^2$ distribution.

1 Introduction

The generalized F distribution is defined as follows. Suppose that the elements of $X = [\chi^2_{m_1}, \ldots, \chi^2_{m_r}]$ $(r \ge 1)$ are independent chi-square random variables with degrees of freedom $(m_1, \ldots, m_r)$, respectively; let $\{a_1 \ge a_2 \ge \cdots \ge a_r > 0\}$ be nonincreasing positive weights; and set $T = a_1\chi^2_{m_1} + \cdots + a_r\chi^2_{m_r}$. If $\mathcal{L}(V) = \chi^2(\nu)$ independently of $X$, then the cdf of

$$ W = \frac{T/|m|}{V/\nu} = \frac{(a_1\chi^2_{m_1} + \cdots + a_r\chi^2_{m_r})/|m|}{\chi^2_\nu/\nu}, \qquad (1) $$

where $|m| = m_1 + \cdots + m_r$, is denoted by $F_r(w; a_1, \ldots, a_r; m_1, \ldots, m_r; \nu)$. If all of the $a_i$ $(1 \le i \le r)$ are equal to, say, $a$, then the cdf of $W$ is denoted by $F_r(w; a; m_1, \ldots, m_r; \nu)$, the scaled central F distribution with degrees of freedom $(|m|, \nu)$. To avoid the trivial case, we will assume that the positive weights are pairwise distinct.
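As a quick illustration of definition (1), the following Python sketch draws samples of $W$ by simulation. It is only a cross-checking device, not one of the methods of this paper; the function names are hypothetical, and it relies on the standard fact that a $\chi^2_k$ variable is a Gamma variable with shape $k/2$ and scale 2.

```python
import random


def sample_generalized_f(a, m, nu, rng):
    """Draw one W = (T/|m|)/(V/nu), T = sum_i a_i * chi2(m_i), V ~ chi2(nu) independent.

    A chi-square variable with k degrees of freedom is drawn as Gamma(shape=k/2, scale=2).
    """
    t = sum(ai * rng.gammavariate(mi / 2.0, 2.0) for ai, mi in zip(a, m))
    v = rng.gammavariate(nu / 2.0, 2.0)
    return (t / sum(m)) / (v / nu)


def monte_carlo_tail(a, m, nu, y, n_samples=100_000, seed=12345):
    """Crude Monte Carlo estimate of P[W > y]."""
    rng = random.Random(seed)
    hits = sum(sample_generalized_f(a, m, nu, rng) > y for _ in range(n_samples))
    return hits / n_samples


def monte_carlo_mean(a, m, nu, n_samples=100_000, seed=12345):
    """Crude Monte Carlo estimate of E[W] (finite when nu > 2)."""
    rng = random.Random(seed)
    return sum(sample_generalized_f(a, m, nu, rng) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    # Equal weights reduce W to a central F(3, 9) variable; 3.8625 is its 95% point,
    # so the estimate should be close to 0.05.
    print(monte_carlo_tail((1.0, 1.0, 1.0), (1, 1, 1), 9, 3.8625))
```

Since $E[W] = (\sum_i a_i m_i/|m|)\,\nu/(\nu - 2)$ for $\nu > 2$, `monte_carlo_mean` gives a second simple check on the sampler.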

We will give exact expressions for the pdf of $W$ for $r = 2$ in terms of the hypergeometric series ${}_2F_1$. This is the analog for generalized F distributions of the known result for a mixture of two chi-square distributions (Bock and Solomon (1988)). For $r > 2$, we give three numerically tractable expressions for the pdf and cdf of $W$. Applications include the detection of joint outliers using Cook's $D_I$ statistics and the calculation of the power of Hotelling's $T^2$ test with a misspecified scale.



2 The Distribution of $(T/|m|)/(V/\nu)$

Building on the work of Robbins and Pitman (1949), Gurland (1955), and Kotz, Johnson, and Boyd (1967), Ramirez and Jensen (1991) showed how to compute the pdf of $W_0 = T/V$ as a weighted series of F distributions, and they computed error bounds for the truncated partial sums. Their results are stated for $W_0 = T/V$, with $r = p$ and $\mathcal{L}(V) = \chi^2(\nu^* - p + 1)$, and they used the notation of Kotz, Johnson, and Boyd (1967). We give the results for the general case below, where it is convenient for our derivation to use the notation of Robbins and Pitman (1949).

2.1 The Probability Distribution Function for W 

Write

$$ T = a_r\left(\frac{a_1}{a_r}\chi^2_{m_1} + \cdots + \frac{a_{r-1}}{a_r}\chi^2_{m_{r-1}} + \chi^2_{m_r}\right). \qquad (2) $$

Following Robbins and Pitman (1949, p. 555), define the constants $c_j$ by the identity

$$ A\prod_{i=1}^{r}\left[1 - \left(1 - \frac{a_r}{a_i}\right)z\right]^{-m_i/2} = \sum_{j=0}^{\infty} c_j z^j, \qquad (3) $$

where

$$ A = \prod_{i=1}^{r}\left(\frac{a_r}{a_i}\right)^{m_i/2}. \qquad (4) $$

The series in Equation (3) converges absolutely for $|z| < a_1/(a_1 - a_r)$. Set $z = 0$ to see that $c_0 = A$, and set $z = 1$ for the equality $\sum_{j=0}^{\infty} c_j = 1$. Then $P[T \le y] = \sum_j c_j G_{|m|+2j}(y/a_r)$, where $G_k$ is the cdf of the chi-square distribution with $k$ degrees of freedom. As in Ramirez and Jensen (1991, p. 100), we find that the pdf of $W = (T/|m|)/(V/\nu)$ has the representation stated in the following theorem.

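Theorem 1 With the notation above,

$$ h_W(w) = \sum_{j=0}^{\infty}\frac{c_j\,|m|}{a_r(|m| + 2j)}\,f_F\!\left(\frac{|m|\,w}{(|m| + 2j)\,a_r};\ |m| + 2j,\ \nu\right), \qquad (5) $$

with $f_F(w; \nu_1, \nu_2)$ the density of the central F distribution with degrees of freedom $(\nu_1, \nu_2)$. A bound for the global truncation error $\epsilon_\tau$ of the $\tau$th partial sum of the pdf of $W = (T/|m|)/(V/\nu)$ is given by

$$ \epsilon_\tau = \sum_{j=\tau+1}^{\infty}\frac{c_j\,|m|}{a_r(|m| + 2j)}\,f_F\!\left(\frac{|m|\,w}{(|m| + 2j)\,a_r};\ |m| + 2j,\ \nu\right) \qquad (6) $$

$$ \le \frac{|m|}{a_r(|m| + 2(\tau + 1))}\left(1 - (c_0 + \cdots + c_\tau)\right). \qquad (7) $$

Proof. Use the equality $\sum_{j=0}^{\infty} c_j = 1$, note that $|m|/(|m| + 2j) \le |m|/(|m| + 2(\tau + 1))$ for $j \ge \tau + 1$, and note that $|f_F(w; \nu_1, \nu_2)| \le 1$ when $\nu_1 \ge 2$ and $\nu_2 \ge 1$. ∎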

The global bound $\epsilon_\tau$ can be used to determine the number of terms $\tau$ to use in the truncated series expansion of the pdf of $W$ in Equation (5). In Section 4.2, we improve on the global error bound $\epsilon_\tau$ by identifying the local error bound as a hypergeometric function ${}_2F_1$.
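The global bound (7) can be turned directly into a stopping rule for the number of terms. The Python sketch below is an illustration under stated assumptions, not the authors' code: the coefficients $c_j$ are obtained by multiplying out the negative binomial expansions of the factors in identity (3) (a truncated convolution), $a_r$ is taken as the smallest weight, and the criterion $y\,\epsilon_\tau < 10^{-4}$ mirrors the one quoted for Table 1. All function names are hypothetical.

```python
import math


def mixture_coefficients(a, m, n_terms):
    """c_0, ..., c_{n_terms-1} defined by identity (3).

    Each factor [1 - u_i z]^(-m_i/2), u_i = 1 - a_r/a_i, is expanded as a negative
    binomial series and the series are multiplied together (truncated convolution).
    """
    a_r = min(a)
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    c = [1.0] + [0.0] * (n_terms - 1)
    for ai, mi in zip(a, m):
        u = 1.0 - a_r / ai
        if u == 0.0:
            continue  # the factor belonging to the smallest weight equals 1
        b = [1.0]
        for n in range(1, n_terms):
            b.append(b[-1] * (mi / 2.0 + n - 1) / n * u)  # (m_i/2)_n / n! * u^n
        c = [sum(c[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]
    return [A * cj for cj in c]


def global_bound(c, m_total, a_r, tau):
    """Right-hand side of (7) for the tau-th partial sum of the pdf series (5)."""
    return m_total / (a_r * (m_total + 2 * (tau + 1))) * (1.0 - sum(c[: tau + 1]))


def terms_needed_global(a, m, y, tol=1e-4, max_terms=400):
    """Smallest tau with y * epsilon_tau < tol, using the global bound (7)."""
    c = mixture_coefficients(a, m, max_terms)
    for tau in range(max_terms):
        if y * global_bound(c, sum(m), min(a), tau) < tol:
            return tau
    raise RuntimeError("bound not reached; increase max_terms")


if __name__ == "__main__":
    # Weights {2, 2, 1/2} of the Hotelling example in Section 5.2.2 at y = 3.8625.
    print(terms_needed_global((2.0, 2.0, 0.5), (1, 1, 1), 3.8625))
```

Note that the bound (7) does not depend on $\nu$ or on $w$, which is what makes it global and also what makes it conservative.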

2.2 Calculation of the Coefficients Cj 

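Kotz, Johnson, and Boyd (1967) gave the following expression for $c_j$:

$$ c_j = \prod_{i=1}^{r}\left(\frac{a_r}{a_i}\right)^{m_i/2}\sum_{n_1+\cdots+n_r=j}\ \prod_{i=1}^{r}\frac{(m_i/2)_{n_i}}{n_i!}\left(1 - \frac{a_r}{a_i}\right)^{n_i}. $$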

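We are able to reduce the numerical complexity of computing the coefficients $c_j$ by determining a recursive algorithm for $c_j$. Fix parameters $\mu_1, \ldots, \mu_r$ and variables $u_1, \ldots, u_r$ with $|u_i| < 1$ for all $i$ $(1 \le i \le r)$. For $k = 0, 1, 2, \ldots$, let

$$ P_k = \sum_{|n|=k}\ \prod_{i=1}^{r}\frac{(\mu_i)_{n_i}}{n_i!}\,u_i^{n_i}, \qquad n = (n_1, \ldots, n_r). $$

Note that $\sum_{k=0}^{\infty} P_k = \prod_{i=1}^{r}(1 - u_i)^{-\mu_i}$. Denote the set $R = \{1, 2, \ldots, r\}$. For $i \in R$, define

$$ e_i = \sum_{S \subseteq R,\ |S| = i}\ \prod_{j \in S} u_j, \qquad f_i = \sum_{S \subseteq R,\ |S| = i}\left(\prod_{j \in S} u_j\right)\left(\sum_{j \in S}\mu_j\right). $$

Thus $e_i$ is the elementary symmetric function of degree $i$ in $u_1, \ldots, u_r$. Then for $k \ge 1$

$$ k P_k = \sum_{i=1}^{r}(-1)^{i-1}\left((k - i)e_i + f_i\right)P_{k-i}, $$

with the convention $P_l = 0$ for $l < 0$.

To prove the identity, let $\lambda_i = \mu_i - 1$ for all $i$, and for a fixed $n = (n_1, \ldots, n_r)$ with $|n| = k$, examine the coefficient of $\prod_{i=1}^{r}((\mu_i)_{n_i}/n_i!)\,u_i^{n_i}$ in the sum $\sum_{i=1}^{r}(-1)^{i-1}((k - i)e_i + f_i)P_{k-i}$. Let $C_s = n_s/(\mu_s + n_s - 1) = n_s/(\lambda_s + n_s)$. Since $(k - i) + \sum_{j \in S}\mu_j = k + \sum_{j \in S}\lambda_j$ when $|S| = i$, this coefficient equals

$$ \sum_{i=1}^{r}(-1)^{i-1}\sum_{S \subseteq R,\ |S| = i}\left(\prod_{j \in S} C_j\right)\left(k + \sum_{j \in S}\lambda_j\right). $$

The coefficient of $k$ in this expression is $1 - \prod_{s=1}^{r}(1 - C_s)$. For each $s$, the coefficient of $\lambda_s$ is

$$ \sum_{i=1}^{r}(-1)^{i-1}\sum_{S \ni s,\ |S| = i}\ \prod_{j \in S} C_j = C_s\prod_{j \ne s}(1 - C_j). $$

But $\lambda_s C_s = n_s(1 - C_s)$, and so these terms sum to $\sum_{s=1}^{r} n_s\prod_{j=1}^{r}(1 - C_j) = k\prod_{j=1}^{r}(1 - C_j)$, since $|n| = k$. Hence the coefficient equals $k$, which proves the identity. The recursion for the coefficients follows by noting that $c_k = A P_k$ with $\mu_i = m_i/2$ and $u_i = 1 - a_r/a_i$.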
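A minimal Python sketch of this recursion, assuming $\mu_i = m_i/2$ and $u_i = 1 - a_r/a_i$ as in the last step of the proof. The helper `coefficients_direct` evaluates the multinomial expansion of identity (3) by brute-force enumeration of $n_1 + \cdots + n_r = j$ and is included only as a cross-check; both function names are hypothetical. The subsets $S$ are enumerated explicitly, which is adequate for the small $r$ arising in the applications.

```python
import math
from itertools import combinations, product


def coefficients_recursive(a, m, n_terms):
    """c_0..c_{n_terms-1} via k P_k = sum_{i=1}^{min(k,r)} (-1)^(i-1) ((k-i) e_i + f_i) P_{k-i}.

    mu_i = m_i / 2, u_i = 1 - a_r / a_i, and c_k = A * P_k with A from (4).
    """
    r = len(a)
    a_r = min(a)
    mu = [mi / 2.0 for mi in m]
    u = [1.0 - a_r / ai for ai in a]
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    # e[i]: elementary symmetric function of degree i in u_1..u_r
    # f[i]: sum over |S| = i of (prod_{j in S} u_j) * (sum_{j in S} mu_j)
    e = [0.0] * (r + 1)
    f = [0.0] * (r + 1)
    for i in range(1, r + 1):
        for S in combinations(range(r), i):
            prod_u = math.prod(u[j] for j in S)
            e[i] += prod_u
            f[i] += prod_u * sum(mu[j] for j in S)
    P = [1.0]  # P_0 = 1
    for k in range(1, n_terms):
        total = sum((-1) ** (i - 1) * ((k - i) * e[i] + f[i]) * P[k - i]
                    for i in range(1, min(k, r) + 1))
        P.append(total / k)
    return [A * p for p in P]


def coefficients_direct(a, m, j):
    """c_j = A * sum_{|n| = j} prod_i (m_i/2)_{n_i} / n_i! * u_i^{n_i}, by brute force."""
    a_r = min(a)
    u = [1.0 - a_r / ai for ai in a]
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    total = 0.0
    for n in product(range(j + 1), repeat=len(a)):
        if sum(n) != j:
            continue
        term = 1.0
        for ni, mi, ui in zip(n, m, u):
            poch = math.prod(mi / 2.0 + k for k in range(ni))  # (m_i/2)_{n_i}
            term *= poch / math.factorial(ni) * ui ** ni
        total += term
    return A * total


if __name__ == "__main__":
    a, m = (3.0, 1.5, 1.0), (2, 1, 3)
    rec = coefficients_recursive(a, m, 8)
    print(all(abs(rec[j] - coefficients_direct(a, m, j)) < 1e-12 for j in range(8)))
```

Each new coefficient costs $O(r)$ operations once the $e_i$ and $f_i$ are known, whereas the direct sum grows combinatorially with $j$ and $r$.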

3 Exact Expressions for the pdf of W 

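Use the negative binomial series

$$ (1 - sz)^{-\mu} = \sum_{m=0}^{\infty}\frac{(\mu)_m}{m!}\,s^m z^m \qquad (9) $$

to express Equation (3) as

$$ \sum_{j=0}^{\infty} c_j z^j = A\sum_{j=0}^{\infty}\ \sum_{i_1+\cdots+i_{r-1}=j}\ \prod_{k=1}^{r-1}\frac{(m_k/2)_{i_k}}{i_k!}\,u_k^{i_k}\,z^j \qquad (10) $$

with

$$ 0 < u_i = 1 - \frac{a_r}{a_i} < 1 \qquad (1 \le i < r). $$

Note that $u_r = 0$. Denote

$$ a = \frac{\nu a_r}{|m|}, \qquad (11) $$

$$ B_0 = \frac{a^{\nu/2}\,\Gamma\!\left(\frac{\nu+|m|}{2}\right)}{\Gamma\!\left(\frac{|m|}{2}\right)\Gamma\!\left(\frac{\nu}{2}\right)}, \qquad (12) $$

$$ B_1(w) = \frac{w^{(|m|-2)/2}}{(a + w)^{(\nu+|m|)/2}}, \qquad (13) $$

and, with

$$ t(w) = \frac{w}{a + w}, $$

substitute the F density into Equation (5) to write the pdf of $W = (T/|m|)/(V/\nu)$ as

$$ h_W(w) = \sum_{j=0}^{\infty}\frac{c_j\,\Gamma\!\left(\frac{\nu+|m|+2j}{2}\right)a^{\nu/2}\,w^{(|m|+2j-2)/2}}{\Gamma\!\left(\frac{|m|+2j}{2}\right)\Gamma\!\left(\frac{\nu}{2}\right)(a + w)^{(\nu+|m|+2j)/2}} $$

$$ = B_0 B_1(w)\sum_{j=0}^{\infty} c_j\,\frac{\left(\frac{\nu+|m|}{2}\right)_j}{\left(\frac{|m|}{2}\right)_j}\,t(w)^j \qquad (14) $$

$$ = A B_0 B_1(w)\sum_{j=0}^{\infty}\frac{\left(\frac{\nu+|m|}{2}\right)_j}{\left(\frac{|m|}{2}\right)_j}\sum_{i_1+\cdots+i_{r-1}=j}\ \prod_{k=1}^{r-1}\frac{(m_k/2)_{i_k}}{i_k!}\,(u_k t(w))^{i_k} \qquad (15) $$

$$ = A B_0 B_1(w)\,F_D^{(r-1)}\!\left(\frac{\nu+|m|}{2}, \frac{m_1}{2}, \ldots, \frac{m_{r-1}}{2}; \frac{|m|}{2}; t(w)u_1, \ldots, t(w)u_{r-1}\right), \qquad (16) $$

where $F_D^{(r-1)}$ is a Lauricella function (Srivastava and Karlsson (1985, p. 41), whose statement contains a typographical error that we correct in Equation (16)). Equation (16) gives a representation of the pdf of $W$. We will show in Theorem 3 that the cdf of $W$ is also a Lauricella $F_D^{(r)}$ function; this representation will yield a numerically computable algorithm for finding p-values. Equation (14) yields a numerically tractable expression for the pdf of $W$. In Section 4.2, we give a tight local truncation error bound $\epsilon_\tau(w)$ for determining the number of terms $\tau$ to use in the partial sum expression.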
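The algebra leading from (5) to (14) can be checked numerically. The Python sketch below evaluates a truncated version of (14), with $B_0$ and $B_1(w)$ from (12) and (13) computed through log-gamma values to avoid overflow, and compares it with the mixture of F densities in (5). The coefficients $c_j$ are computed by multiplying out the negative binomial factors of (3); the truncation length and all function names are illustrative assumptions.

```python
import math


def mixture_coefficients(a, m, n_terms):
    """c_0..c_{n_terms-1} from identity (3), by multiplying the negative binomial series."""
    a_r = min(a)
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    c = [1.0] + [0.0] * (n_terms - 1)
    for ai, mi in zip(a, m):
        u = 1.0 - a_r / ai
        if u == 0.0:
            continue
        b = [1.0]
        for n in range(1, n_terms):
            b.append(b[-1] * (mi / 2.0 + n - 1) / n * u)
        c = [sum(c[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]
    return [A * cj for cj in c]


def f_density(x, d1, d2):
    """Density of the central F distribution with (d1, d2) degrees of freedom, x > 0."""
    log_f = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2) + (d1 / 2 - 1) * math.log(x)
             - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))
    return math.exp(log_f)


def pdf_mixture(w, a, m, nu, n_terms=300):
    """Equation (5): weighted series of F densities."""
    mt, a_r = sum(m), min(a)
    c = mixture_coefficients(a, m, n_terms)
    return sum(cj * mt / (a_r * (mt + 2 * j))
               * f_density(mt * w / ((mt + 2 * j) * a_r), mt + 2 * j, nu)
               for j, cj in enumerate(c))


def pdf_series(w, a, m, nu, n_terms=300):
    """Equation (14): B_0 B_1(w) * sum_j c_j ((nu+|m|)/2)_j / (|m|/2)_j * t(w)^j."""
    mt, a_r = sum(m), min(a)
    alpha, beta = (nu + mt) / 2, mt / 2
    a_const = nu * a_r / mt                      # the constant a of (11)
    t = w / (a_const + w)
    log_b0 = ((nu / 2) * math.log(a_const) + math.lgamma(alpha)
              - math.lgamma(beta) - math.lgamma(nu / 2))
    log_b1 = (beta - 1) * math.log(w) - alpha * math.log(a_const + w)
    total, ratio = 0.0, 1.0                      # ratio = (alpha)_j / (beta)_j * t^j
    for j, cj in enumerate(mixture_coefficients(a, m, n_terms)):
        total += cj * ratio
        ratio *= (alpha + j) / (beta + j) * t
    return math.exp(log_b0 + log_b1) * total


if __name__ == "__main__":
    a, m, nu = (3.0, 1.5, 1.0), (2, 1, 3), 7
    for w in (0.5, 1.0, 3.0):
        print(w, pdf_series(w, a, m, nu), pdf_mixture(w, a, m, nu))
```

Updating the Pochhammer ratio term by term avoids evaluating large gamma values for each $j$.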

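3.1 Exact Expressions for the pdf of W with r = 2

If $r = 2$, the Lauricella function in Equation (16) reduces to a Gauss hypergeometric series, and we have the following result.

Theorem 2 With the notation above, $a = \nu a_2/|m|$, and $r = 2$, the pdf of $W$ is given by

$$ h_W(w) = A B_0 B_1(w)\sum_{i=0}^{\infty}\frac{\left(\frac{\nu+m_1+m_2}{2}\right)_i\left(\frac{m_1}{2}\right)_i}{\left(\frac{m_1+m_2}{2}\right)_i\,i!}\left(\left(1 - \frac{a_2}{a_1}\right)t(w)\right)^i \qquad (17) $$

$$ = A B_0 B_1(w)\ {}_2F_1\!\left(\frac{\nu+m_1+m_2}{2}, \frac{m_1}{2}; \frac{m_1+m_2}{2}; \left(1 - \frac{a_2}{a_1}\right)t(w)\right). \qquad (18) $$

To find the cdf of $W$ when $r = 2$, integrate $h_W(w)$ in Equation (18). We note that if we had used the notation of Kotz, Johnson, and Boyd (1967) and scaled $y$ by $y/\delta$ with $0 < \delta < a_r$, then $u_2 > 0$. In this situation, we would use the Bailey transformation (Srivastava and Karlsson (1985, p. 304)) to convert the two-variable hypergeometric series in Equation (16) to the ${}_2F_1$ function in Equation (18).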
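For $r = 2$ the pdf (18) needs only a Gauss hypergeometric series. The Python sketch below evaluates (18) and, following the remark above, obtains the cdf by integrating $h_W(w)$ numerically with composite Simpson's rule; as a usage example it recomputes the Hald p-value reported in Section 5.1.2. The plain power-series ${}_2F_1$ (valid for $0 \le z < 1$ with positive parameters), the grid size, the assumption $m_1 + m_2 \ge 2$, and the function names are illustrative choices.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100_000):
    """Gauss series 2F1(a, b; c; z) for 0 <= z < 1 and positive parameters."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def pdf_r2(w, a1, a2, m1, m2, nu):
    """Equation (18): A B_0 B_1(w) 2F1((nu+|m|)/2, m1/2; |m|/2; (1 - a2/a1) t(w))."""
    mt = m1 + m2
    alpha, beta = (nu + mt) / 2, mt / 2
    a_const = nu * a2 / mt                       # a = nu a_2 / |m|
    big_a = (a2 / a1) ** (m1 / 2)                # A of (4) when r = 2
    b0 = math.exp((nu / 2) * math.log(a_const) + math.lgamma(alpha)
                  - math.lgamma(beta) - math.lgamma(nu / 2))
    b1 = w ** (beta - 1) / (a_const + w) ** alpha
    t = w / (a_const + w)
    return big_a * b0 * b1 * hyp2f1(alpha, m1 / 2, beta, (1 - a2 / a1) * t)


def cdf_r2(y, a1, a2, m1, m2, nu, n_intervals=4000):
    """P[W <= y] by composite Simpson's rule applied to (18) on [0, y]."""
    h = y / n_intervals
    s = pdf_r2(0.0, a1, a2, m1, m2, nu) + pdf_r2(y, a1, a2, m1, m2, nu)
    for k in range(1, n_intervals):
        s += (4 if k % 2 else 2) * pdf_r2(k * h, a1, a2, m1, m2, nu)
    return s * h / 3


if __name__ == "__main__":
    # Hald data, I = {6, 8}: weights (0.408676, 0.124019), m = (1, 1), nu = 6, y = 2.19331.
    print(1 - cdf_r2(2.19331, 0.408676, 0.124019, 1, 1, 6))
```

With equal weights the same code reduces to the scaled F(2, 6) density $(1 + w/3)^{-4}$, whose tail beyond 2 is $(5/3)^{-3} = 0.216$, which is a convenient sanity check.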

3.2 Exact Expressions for the cdf of W with r > 2 

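The Lauricella function $F_D^{(r-1)}$ in Equation (16) has an integral representation (Exton, 1976, p. 49), where the domain of integration is the simplex $E_r$ with $x_1 + \cdots + x_r = 1$ $(x_i \ge 0,\ 1 \le i \le r)$:

$$ F_D^{(r-1)}\!\left(\frac{\nu+|m|}{2}, \frac{m_1}{2}, \ldots, \frac{m_{r-1}}{2}; \frac{|m|}{2}; tu_1, \ldots, tu_{r-1}\right) = \frac{\Gamma\!\left(\frac{|m|}{2}\right)}{\prod_{i=1}^{r}\Gamma\!\left(\frac{m_i}{2}\right)}\int_{E_r}\left(1 - \sum_{i=1}^{r-1} t u_i x_i\right)^{-(\nu+|m|)/2}\prod_{i=1}^{r} x_i^{m_i/2 - 1}\,d\sigma. \qquad (19) $$

In Dunkl and Ramirez (1994a, 1994b), we computed the surface measure of ellipsoids using hyperelliptic integrals. We showed that the $(n - 1)$-dimensional hyperelliptic integral could be transformed into a univariate integral using the Euler integral representation (Exton, 1976, p. 49) for $F_D$. That representation requires the first parameter to be smaller than the denominator parameter, so the transformation does not apply to Equation (19), where $(\nu + |m|)/2 > |m|/2$. Here we will use a different approach.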

We show how to represent the cdf of the generalized F distribution $W$ as a Lauricella $F_D^{(r)}$ function. This representation provides a numerically tractable procedure for computing the cdf of $W$, denoted by $H_W(y)$, which does not require integrating the pdf of $W$.

Theorem 3 With the notation above and $r > 2$, the cdf of $W$ is given by

$$ H_W(y) = A B_0\,\frac{y^{|m|/2}}{(|m|/2)(a + y)^{(\nu+|m|)/2}}\,F_D^{(r)}\!\left(\frac{\nu+|m|}{2}, \frac{m_1}{2}, \ldots, \frac{m_{r-1}}{2}, 1; \frac{|m|}{2} + 1; t(y)u_1, \ldots, t(y)u_{r-1}, t(y)\right), \qquad (20) $$

with $a = \nu a_r/|m|$ and $t(y) = y/(a + y)$ as before.



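Proof. From Equations (16) and (19), write the cdf of $W$ as

$$ H_W(y) = \int_0^y h_W(w)\,dw = A B_0\,\frac{\Gamma\!\left(\frac{|m|}{2}\right)}{\prod_{i=1}^{r}\Gamma\!\left(\frac{m_i}{2}\right)}\int_0^y\!\int_{E_r} w^{|m|/2 - 1}\left(a + w - w\sum_{i=1}^{r-1} u_i x_i\right)^{-(\nu+|m|)/2}\prod_{i=1}^{r} x_i^{m_i/2 - 1}\,d\sigma\,dw. $$

Change variables with $s_i = w x_i/y$ $(1 \le i \le r)$ and $s_{r+1} = 1 - w/y$. Note that $\sum_{i=1}^{r+1} s_i = 1$, with the absolute value of the inverse Jacobian

$$ \left|\frac{\partial(x_1, \ldots, x_{r-1}, w)}{\partial(s_1, \ldots, s_{r-1}, s_r)}\right| = \frac{y^r}{w^{r-1}}. $$

Thus

$$ H_W(y) = A B_0\,\frac{\Gamma\!\left(\frac{|m|}{2}\right)}{\prod_{i=1}^{r}\Gamma\!\left(\frac{m_i}{2}\right)}\,\frac{y^{|m|/2}}{(a + y)^{(\nu+|m|)/2}}\int_{E_{r+1}}\left(1 - t(y)\sum_{i=1}^{r-1} u_i s_i - t(y)s_{r+1}\right)^{-(\nu+|m|)/2}\prod_{i=1}^{r} s_i^{m_i/2 - 1}\,d\sigma $$

$$ = A B_0\,\frac{y^{|m|/2}}{(|m|/2)(a + y)^{(\nu+|m|)/2}}\,F_D^{(r)}\!\left(\frac{\nu+|m|}{2}, \frac{m_1}{2}, \ldots, \frac{m_{r-1}}{2}, 1; \frac{|m|}{2} + 1; t(y)u_1, \ldots, t(y)u_{r-1}, t(y)\right), \qquad (21) $$

with $a = \nu a_r/|m|$ and $t(y) = y/(a + y)$. ∎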



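To convert Equation (20) into a numerically tractable series, write

$$ H_W(y) = A B_0\,\frac{y^{|m|/2}}{(|m|/2)(a + y)^{(\nu+|m|)/2}}\,F_D^{(r)}\!\left(\frac{\nu+|m|}{2}, \frac{m_1}{2}, \ldots, \frac{m_{r-1}}{2}, 1; \frac{|m|}{2} + 1; t(y)u_1, \ldots, t(y)u_{r-1}, t(y)\right) $$

$$ = B_0\,\frac{y^{|m|/2}}{(|m|/2)(a + y)^{(\nu+|m|)/2}}\sum_{j=0}^{\infty}\frac{\left(\frac{\nu+|m|}{2}\right)_j}{\left(\frac{|m|}{2}+1\right)_j}\,t(y)^j\,A\sum_{i_1+\cdots+i_{r-1}\le j}\ \prod_{k=1}^{r-1}\frac{(m_k/2)_{i_k}}{i_k!}\,u_k^{i_k} $$

$$ = B_0\,\frac{y^{|m|/2}}{(|m|/2)(a + y)^{(\nu+|m|)/2}}\sum_{j=0}^{\infty}\frac{\left(\frac{\nu+|m|}{2}\right)_j}{\left(\frac{|m|}{2}+1\right)_j}\left(\sum_{i=0}^{j} c_i\right)\left(\frac{y}{a + y}\right)^j. \qquad (22) $$

Here the second line groups the terms of the Lauricella series by their total degree $j$ (the parameter 1 contributes $(1)_l/l! = 1$), and the third line uses Equation (10).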
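Equation (22) needs only the partial sums $c_0 + \cdots + c_j$, so the cdf can be evaluated without numerical integration. A minimal Python sketch, assuming the same direct expansion of (3) for the $c_j$ as above and a fixed number of terms (the error control of Section 4.1 is deliberately left out here); the function names are hypothetical.

```python
import math


def mixture_coefficients(a, m, n_terms):
    """c_0..c_{n_terms-1} from identity (3), by multiplying the negative binomial series."""
    a_r = min(a)
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    c = [1.0] + [0.0] * (n_terms - 1)
    for ai, mi in zip(a, m):
        u = 1.0 - a_r / ai
        if u == 0.0:
            continue
        b = [1.0]
        for n in range(1, n_terms):
            b.append(b[-1] * (mi / 2.0 + n - 1) / n * u)
        c = [sum(c[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]
    return [A * cj for cj in c]


def cdf_series(y, a, m, nu, n_terms=400):
    """Equation (22): K(y) * sum_j (alpha)_j / (|m|/2 + 1)_j * (c_0 + ... + c_j) * t^j,
    with K(y) = B_0 y^{|m|/2} / ((|m|/2) (a + y)^{(nu+|m|)/2}) and t = y / (a + y)."""
    mt, a_r = sum(m), min(a)
    alpha, beta = (nu + mt) / 2, mt / 2
    a_const = nu * a_r / mt
    t = y / (a_const + y)
    log_k = ((nu / 2) * math.log(a_const) + math.lgamma(alpha) - math.lgamma(beta)
             - math.lgamma(nu / 2) + beta * math.log(y) - alpha * math.log(a_const + y)
             - math.log(beta))
    total, ratio, partial = 0.0, 1.0, 0.0        # ratio = (alpha)_j / (beta + 1)_j * t^j
    for j, cj in enumerate(mixture_coefficients(a, m, n_terms)):
        partial += cj                            # c_0 + ... + c_j
        total += ratio * partial
        ratio *= (alpha + j) / (beta + 1 + j) * t
    return math.exp(log_k) * total


if __name__ == "__main__":
    # Equal weights: W is F(3, 9) and 3.8625 is its 95% point.
    print(1 - cdf_series(3.8625, (1.0, 1.0, 1.0), (1, 1, 1), 9))
    # Hald data, I = {6, 8} (Section 5.1.2).
    print(1 - cdf_series(2.19331, (0.408676, 0.124019), (1, 1), 6))
```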



4 Local Truncation Error Bounds 

Denote by $\hat h_W(w)$ and $\hat H_W(y)$ the partial sum estimates of $h_W(w)$ and $H_W(y)$, respectively, from Equations (14) and (22). In this section, we derive local truncation error bounds to determine the number of terms required by the partial sums.



4.1 Local Truncation Error Bound $\epsilon^*_\tau(y)$ for the cdf of W

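For Equation (22) to be numerically tractable, we derive the local truncation error. Write $t(y) = y/(a + y) < 1$ and note that $y^{|m|/2}/(a + y)^{(\nu+|m|)/2} = y B_1(y)$. Then

$$ H_W(y) - \hat H_W(y) = \frac{B_0 B_1(y)\,y}{|m|/2}\sum_{j=\tau+1}^{\infty}\frac{\left(\frac{\nu+|m|}{2}\right)_j}{\left(\frac{|m|}{2}+1\right)_j}\left(\sum_{i=0}^{j} c_i\right)t(y)^j \qquad (23) $$

$$ = \frac{B_0 B_1(y)\,y}{|m|/2}\,\frac{\left(\frac{\nu+|m|}{2}\right)_{\tau+1}}{\left(\frac{|m|}{2}+1\right)_{\tau+1}}\,t(y)^{\tau+1}\sum_{j=0}^{\infty}\frac{\left(\frac{\nu+|m|}{2}+\tau+1\right)_j}{\left(\frac{|m|}{2}+\tau+2\right)_j}\left(\sum_{i=0}^{\tau+1+j} c_i\right)t(y)^j. \qquad (24) $$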

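The partial sum estimate $\hat H_W(y)$ can be enhanced by identifying most of the truncation error as a scaled ${}_2F_1$ hypergeometric function. Split $\sum_{i=0}^{\tau+1+j} c_i$ in Equation (24) into $\sum_{i=0}^{\tau} c_i$ plus a remainder that lies between 0 and $1 - \sum_{i=0}^{\tau} c_i$. The first part sums to a scaled ${}_2F_1$ function, and the remaining truncation error is bounded by a scaled ${}_2F_1$ function, as stated in the following theorem.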

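Theorem 4 With the notation above, the estimated $P[W \le y]$ is given by

$$ \tilde H_W(y) = \frac{B_0 B_1(y)\,y}{|m|/2}\left[\sum_{j=0}^{\tau}\frac{\left(\frac{\nu+|m|}{2}\right)_j}{\left(\frac{|m|}{2}+1\right)_j}\left(\sum_{i=0}^{j} c_i\right)t(y)^j + \left(\sum_{i=0}^{\tau} c_i\right)\frac{\left(\frac{\nu+|m|}{2}\right)_{\tau+1}}{\left(\frac{|m|}{2}+1\right)_{\tau+1}}\,t(y)^{\tau+1}\,{}_2F_1\!\left(\frac{\nu+|m|}{2}+\tau+1, 1; \frac{|m|}{2}+\tau+2; t(y)\right)\right], \qquad (25) $$

with local truncation error bound given by

$$ |H_W(y) - \tilde H_W(y)| \le \epsilon^*_\tau(y) = \left(1 - \sum_{i=0}^{\tau} c_i\right)\frac{B_0 B_1(y)\,y}{|m|/2}\,\frac{\left(\frac{\nu+|m|}{2}\right)_{\tau+1}}{\left(\frac{|m|}{2}+1\right)_{\tau+1}}\,t(y)^{\tau+1}\,{}_2F_1\!\left(\frac{\nu+|m|}{2}+\tau+1, 1; \frac{|m|}{2}+\tau+2; t(y)\right). \qquad (26) $$

To find $\tau$, we increase $\tau$ until the remaining error $\epsilon^*_\tau(y)$ from Equation (26) is less than a prescribed small value. The suggested value is $10^{-4}$.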
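A minimal Python sketch of Theorem 4: it returns the enhanced estimate (25) together with the bound (26) and raises $\tau$ until the bound falls below a tolerance. The direct expansion of (3) for the $c_j$, the series ${}_2F_1$, the default tolerance $10^{-4}$, and all function names are assumptions made for illustration; recomputing the coefficients for every $\tau$ keeps the code short at the cost of some redundant work.

```python
import math


def mixture_coefficients(a, m, n_terms):
    """c_0..c_{n_terms-1} from identity (3), by multiplying the negative binomial series."""
    a_r = min(a)
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    c = [1.0] + [0.0] * (n_terms - 1)
    for ai, mi in zip(a, m):
        u = 1.0 - a_r / ai
        if u == 0.0:
            continue
        b = [1.0]
        for n in range(1, n_terms):
            b.append(b[-1] * (mi / 2.0 + n - 1) / n * u)
        c = [sum(c[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]
    return [A * cj for cj in c]


def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100_000):
    """Gauss series 2F1(a, b; c; z) for 0 <= z < 1 and positive parameters."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def cdf_enhanced(y, a, m, nu, tau):
    """Enhanced estimate (25) of P[W <= y] and its error bound (26) for a given tau."""
    mt, a_r = sum(m), min(a)
    alpha, beta = (nu + mt) / 2, mt / 2
    a_const = nu * a_r / mt
    t = y / (a_const + y)
    # K = B_0 B_1(y) y / (|m|/2) = B_0 y^{|m|/2} / ((|m|/2) (a + y)^{(nu+|m|)/2})
    k = math.exp((nu / 2) * math.log(a_const) + math.lgamma(alpha) - math.lgamma(beta)
                 - math.lgamma(nu / 2) + beta * math.log(y)
                 - alpha * math.log(a_const + y) - math.log(beta))
    c = mixture_coefficients(a, m, tau + 1)
    head, ratio, s = 0.0, 1.0, 0.0
    for j in range(tau + 1):
        s += c[j]                                    # s = c_0 + ... + c_j
        head += ratio * s
        ratio *= (alpha + j) / (beta + 1 + j) * t    # -> (alpha)_{j+1}/(beta+1)_{j+1} t^{j+1}
    tail = ratio * hyp2f1(alpha + tau + 1, 1.0, beta + tau + 2, t)
    return k * (head + s * tail), k * (1.0 - s) * tail


def cdf_with_error_control(y, a, m, nu, tol=1e-4, max_tau=500):
    """Increase tau until the bound (26) drops below tol; return (estimate, tau)."""
    for tau in range(max_tau):
        estimate, bound = cdf_enhanced(y, a, m, nu, tau)
        if bound < tol:
            return estimate, tau
    raise RuntimeError("bound not reached; increase max_tau")


if __name__ == "__main__":
    # Hotelling example of Section 5.2.2: weights {2, 2, 1/2}, nu = 9, y = 3.8625.
    estimate, tau = cdf_with_error_control(3.8625, (2.0, 2.0, 0.5), (1, 1, 1), 9)
    print(1 - estimate, tau)
```

Because the remainder $\sum_{i=\tau+1}^{\tau+1+j} c_i$ lies between 0 and $1 - \sum_{i=0}^{\tau} c_i$, the true cdf always lies within $\epsilon^*_\tau(y)$ of the enhanced estimate, which is what the first check below exercises.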

4.2 Local Truncation Error Bound $\epsilon_\tau(w)$ for the pdf of W

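Recall that Equation (14) yields a numerically tractable expression for the pdf of $W$. A tight local truncation error bound for determining the number of terms $\tau$ to use in the partial sum expression follows as above and is stated in the following theorem.

Theorem 5 With the notation above,

$$ h_W(w) \approx \hat h_W(w) = B_0 B_1(w)\sum_{j=0}^{\tau} c_j\,\frac{\left(\frac{\nu+|m|}{2}\right)_j}{\left(\frac{|m|}{2}\right)_j}\,t(w)^j, \qquad (27) $$

with local truncation error bound given by

$$ \epsilon_\tau(w) \le c_{\tau+1}\,B_0 B_1(w)\,\frac{\left(\frac{\nu+|m|}{2}\right)_{\tau+1}}{\left(\frac{|m|}{2}\right)_{\tau+1}}\,t(w)^{\tau+1}\,{}_2F_1\!\left(\frac{\nu+|m|}{2}+\tau+1, 1; \frac{|m|}{2}+\tau+1; t(w)\right), \qquad (28) $$

a scaled ${}_2F_1$ hypergeometric function.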



To determine the number of terms for the partial sum estimate $\hat h_W(w)$, increase $\tau$ until the local truncation error from Equation (28) is less than a prescribed small value. The suggested value is $10^{-4}/y$, where $y$ is the point at which the p-value is calculated.
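The following Python sketch evaluates the partial sum (27) and the bound (28), and applies the stopping rule $y\,\epsilon_\tau(y) < 10^{-4}$ used for Table 1. It is an illustration under stated assumptions (direct expansion of (3) for the $c_j$, series ${}_2F_1$, hypothetical function names). Because (28) uses $c_{\tau+1}$ in place of all later coefficients, the checks below use the equicorrelated example of Section 5.2.2, whose coefficients decrease geometrically.

```python
import math


def mixture_coefficients(a, m, n_terms):
    """c_0..c_{n_terms-1} from identity (3), by multiplying the negative binomial series."""
    a_r = min(a)
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    c = [1.0] + [0.0] * (n_terms - 1)
    for ai, mi in zip(a, m):
        u = 1.0 - a_r / ai
        if u == 0.0:
            continue
        b = [1.0]
        for n in range(1, n_terms):
            b.append(b[-1] * (mi / 2.0 + n - 1) / n * u)
        c = [sum(c[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]
    return [A * cj for cj in c]


def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100_000):
    """Gauss series 2F1(a, b; c; z) for 0 <= z < 1 and positive parameters."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def pdf_truncated(w, a, m, nu, tau):
    """Partial sum (27) of the pdf series (14) and the local error bound (28)."""
    mt, a_r = sum(m), min(a)
    alpha, beta = (nu + mt) / 2, mt / 2
    a_const = nu * a_r / mt
    t = w / (a_const + w)
    b0b1 = math.exp((nu / 2) * math.log(a_const) + math.lgamma(alpha) - math.lgamma(beta)
                    - math.lgamma(nu / 2) + (beta - 1) * math.log(w)
                    - alpha * math.log(a_const + w))     # B_0 B_1(w)
    c = mixture_coefficients(a, m, tau + 2)
    head, ratio = 0.0, 1.0
    for j in range(tau + 1):
        head += c[j] * ratio
        ratio *= (alpha + j) / (beta + j) * t           # -> (alpha)_{j+1}/(beta)_{j+1} t^{j+1}
    bound = c[tau + 1] * b0b1 * ratio * hyp2f1(alpha + tau + 1, 1.0, beta + tau + 1, t)
    return b0b1 * head, bound


def terms_needed_pdf(y, a, m, nu, tol=1e-4, max_tau=500):
    """Smallest tau with y * epsilon_tau(y) < tol."""
    for tau in range(max_tau):
        if y * pdf_truncated(y, a, m, nu, tau)[1] < tol:
            return tau
    raise RuntimeError("bound not reached; increase max_tau")


if __name__ == "__main__":
    # rho = 0.1 in Table 1: weights 1/0.9, 1/0.9, 1/1.2 with nu = 9 and y = 3.8625.
    print(terms_needed_pdf(3.8625, (1 / 0.9, 1 / 0.9, 1 / 1.2), (1, 1, 1), 9))
```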

5 Applications 

We will give two applications where the distribution of the test statistic is the 
generalized F distribution. 

5.1 Detection of Outliers 

Cook's (1977) $D_I$ statistics are used widely for assessing the influence of design points in regression diagnostics. These statistics typically contain a leverage component and a standardized residual component. Subsets $I$ having large $D_I$ are said to be influential, reflecting high leverage for these points or indicating that $I$ contains some outliers. Consider the linear model

$$ Y_0 = X_0\beta + \varepsilon_0, \qquad (29) $$

where $Y_0$ is an $(N \times 1)$ vector of observations, $X_0$ is an $(N \times k)$ full-rank matrix of known constants, $\beta$ is a $(k \times 1)$ vector of unknown parameters, and $\varepsilon_0$ is an $(N \times 1)$ vector of Gaussian errors with $E(\varepsilon_0) = 0$ and $\mathrm{Var}(\varepsilon_0) = \sigma^2 I_N$. The least squares estimate of $\beta$ is $\hat\beta = (X_0'X_0)^{-1}X_0'Y_0$. The basic idea in influence analysis, as introduced by Cook (1977), concerns the stability of a linear regression model under small perturbations. For example, if some cases are deleted, what changes occur in the estimates of the parameter vector $\beta$? Cook's $D_I$ statistics are based on a Mahalanobis distance between $\hat\beta$ (using all the cases) and $\hat\beta_I$ (using all cases except those in the subset $I$), as given by

$$ D_I(\hat\beta, M, c\hat\sigma^2) = (\hat\beta_I - \hat\beta)'M(\hat\beta_I - \hat\beta)/(c\hat\sigma^2), \qquad (30) $$

with a $(k \times k)$ nonnegative definite matrix $M$, an unbiased estimate $\hat\sigma^2$ of the variance, and a user-defined constant $c$. We use $c = r$ and the estimator $s_I^2$, the sample variance estimator with the cases in $I$ omitted. We will discuss the case $M = X'X$, where $X$ denotes the remaining rows of $X_0$. We have chosen $s_I^2$ as the estimator of $\sigma^2$ since this estimator and the numerator of Equation (30) are independent.
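To make (30) concrete, the following Python sketch computes $D_I(\hat\beta, X'X, r s_I^2)$ for a deleted subset $I$, with $M = X'X$ built from the retained rows and $c = r = |I|$. It is a plain-Python illustration with a small hypothetical helper for linear systems, not production regression code; row indices are 0-based and the example data are made up. The check uses the single-case identity $D_I/h_{ii} = (\text{RStudent})^2$ for $r = 1$, which follows from the standard deletion formula $\hat\beta - \hat\beta_{(i)} = (X_0'X_0)^{-1}x_i e_i/(1 - h_{ii})$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gram(X):
    """X'X for a matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]


def least_squares(X, Y):
    """beta_hat = (X'X)^{-1} X'Y via the normal equations."""
    k = len(X[0])
    return solve(gram(X), [sum(row[p] * y for row, y in zip(X, Y)) for p in range(k)])


def cooks_d(X0, Y0, I):
    """D_I(beta_hat, X'X, r s_I^2) of (30): M = X'X from the rows not in I, c = r = |I|."""
    k = len(X0[0])
    keep = [i for i in range(len(X0)) if i not in I]
    X, Y = [X0[i] for i in keep], [Y0[i] for i in keep]
    beta_all, beta_del = least_squares(X0, Y0), least_squares(X, Y)
    resid = [y - sum(xp * bp for xp, bp in zip(row, beta_del)) for row, y in zip(X, Y)]
    s2_del = sum(e * e for e in resid) / (len(keep) - k)          # s_I^2
    d = [bd - ba for bd, ba in zip(beta_del, beta_all)]
    G = gram(X)
    quad = sum(d[p] * G[p][q] * d[q] for p in range(k) for q in range(k))
    return quad / (len(I) * s2_del)


def leverage_and_rstudent2(X0, Y0, i):
    """h_ii = x_i'(X0'X0)^{-1} x_i and the squared externally studentized residual."""
    k = len(X0[0])
    h = sum(p * q for p, q in zip(X0[i], solve(gram(X0), X0[i])))
    e_i = Y0[i] - sum(p * q for p, q in zip(X0[i], least_squares(X0, Y0)))
    keep = [j for j in range(len(X0)) if j != i]
    beta_del = least_squares([X0[j] for j in keep], [Y0[j] for j in keep])
    s2_del = sum((Y0[j] - sum(p * q for p, q in zip(X0[j], beta_del))) ** 2
                 for j in keep) / (len(keep) - k)
    return h, e_i ** 2 / (s2_del * (1 - h))


# Hypothetical example data (not from the paper): a straight line with a perturbed last case.
X0 = [[1.0, float(x)] for x in range(10)]
NOISE = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.3, 0.15, 1.5]
Y0 = [2.0 + 0.5 * x + e for x, e in zip(range(10), NOISE)]

if __name__ == "__main__":
    h, t2 = leverage_and_rstudent2(X0, Y0, 9)
    print(cooks_d(X0, Y0, [9]) / h, t2)   # the two numbers agree
    print(cooks_d(X0, Y0, [8, 9]))        # a joint (r = 2) deletion
```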
Using the results in this paper, we are able to compute numerically the cdf of Cook's $D_I$ statistics in the case of joint outliers and, in particular, to compute the p-values for $D_I$. This approach provides a statistical procedure for identifying influential observations based on p-values.

5.1.1 Notation

To fix the notation, let $I$ be a subset of $\{1, \ldots, N\}$, say $I = \{i_1, \ldots, i_r\}$. Let $X_0$ be partitioned as $X_0' = [X', Z']$, with $Z$ containing the rows determined by $I$ and $X$ the remaining rows. We assume that the matrices $X_0$, $X$, and $Z$ are all of full rank, of orders $(N \times k)$, $(n \times k)$, and $(r \times k)$, respectively, such that $k < n < N$ and $n + r = N$, with $r \le k$ for notational convenience. Partition $Y_0' = [Y_1', Y_2']$ and $\varepsilon_0' = [\varepsilon_1', \varepsilon_2']$. Thus Equation (29) has been transformed into

$$ \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X \\ Z \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}. \qquad (31) $$

The ordered eigenvalues of $Z(X_0'X_0)^{-1}Z'$, denoted $\{\lambda_1 \ge \cdots \ge \lambda_r > 0\}$, are usually called the canonical leverages. Jensen and Ramirez (1991) showed that the cdf of $W_0 = T/V$, equivalently of $W = (T/r)/(V/\nu)$, is a weighted series of F distributions, and they computed the stochastic bounds

$$ F_r(w; a_1; \nu) \le F_r(w; a_1, \ldots, a_r; 1, \ldots, 1; \nu) \le F_r(w; a^*; \nu), \qquad (32) $$

with the maximum weight $a_1$, the geometric mean $a^*$ of the weights $\{a_1, \ldots, a_r\}$, and $F_r(w; a; \nu)$ the scaled central F distribution.
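The bounds (32) only require scaled central F tails, which makes them cheap for screening. The Python sketch below computes the resulting interval for a p-value $P[W > y]$; it assumes unit degrees of freedom $m_i = 1$ as in (32), evaluates the F cdf through the standard identity $I_x(p, q) = x^p(1 - x)^q\,{}_2F_1(p + q, 1; p + 1; x)/(p\,B(p, q))$ for the regularized incomplete beta function, and uses hypothetical function names. The usage example reproduces the Hald interval quoted in Section 5.1.2.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100_000):
    """Gauss series 2F1(a, b; c; z) for 0 <= z < 1 and positive parameters."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def scaled_f_tail(y, a, r, nu):
    """P[a * F(r, nu) > y] via the incomplete beta function I_x(r/2, nu/2)."""
    p, q = r / 2, nu / 2
    z = y / a
    x = r * z / (r * z + nu)
    log_front = (p * math.log(x) + q * math.log1p(-x) - math.log(p)
                 + math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q))
    return 1.0 - math.exp(log_front) * hyp2f1(p + q, 1.0, p + 1, x)


def pvalue_bounds(y, weights, nu):
    """(lower, upper) bounds on P[W > y] from (32): geometric-mean and maximum weight."""
    r = len(weights)
    a_star = math.exp(sum(math.log(w) for w in weights) / r)
    return scaled_f_tail(y, a_star, r, nu), scaled_f_tail(y, max(weights), r, nu)


if __name__ == "__main__":
    # Hald data, I = {6, 8}: canonical leverages as weights, nu = 6, y = 2.19331.
    print(pvalue_bounds(2.19331, (0.408676, 0.124019), 6))
```

For $r = 2$ the tail has the closed form $(1 + 2x/\nu)^{-\nu/2}$, which offers an independent check of `scaled_f_tail`.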

The basic characterization theorem for $D_I$ is given in Jensen and Ramirez (1998a):

Theorem 6 Suppose that $\mathcal{L}(Y_0) = N_N(X_0\beta, \sigma^2 I_N)$. Then the distribution of $D_I(\hat\beta, X'X, r s_I^2)$ is given by $F_r(w; \lambda_1, \ldots, \lambda_r; 1, \ldots, 1; N - r - k)$.

With $r = 1$, $\mathcal{L}(D_I(\hat\beta, X'X, s_I^2)/\lambda_1) = F(1, N - 1 - k)$. Outliers can also be tested using the studentized deleted residuals, with $\mathcal{L}((y_i - \hat y_{(i)})/(s_i(1 + x_i'(X'X)^{-1}x_i)^{1/2})) = t(N - 1 - k)$, where $\hat y_{(i)}$ denotes the predicted value using $(Y_1, X)$; or with the externally studentized residuals (RStudent), with $\mathcal{L}((y_i - \hat y_i)/(s_i\sqrt{1 - h_{ii}})) = t(N - 1 - k)$, where $\hat y_i$ denotes the predicted value using $(Y_0, X_0)$ and $h_{ii}$ is the canonical leverage, also denoted $\lambda_1$. In Jensen and Ramirez (1998b) it is shown that the p-values from these two tests are also equal to the p-values from Theorem 6. Thus, in the case of single deletion with $r = 1$, all three of these standard tests for outliers have a common p-value.

5.1.2 Examples 

For the Hald (1952, p. 647) data set ($N = 13$ and $k = 5$), using the test statistic $D_I(\hat\beta, X'X, 2s_I^2)$ and the global bounds in Equation (32), we can show that the only pair ($r = 2$) of observations (from the 78 possible pairs) that could possibly be influential at the 5% significance level is $I = \{6, 8\}$, with $0.01305 < p_I < 0.04610$. Using $m_1 = m_2 = 1$, the canonical leverages $\lambda = (0.408676, 0.124019)$ for the weights $a$, the degrees of freedom $\nu = N - r - k = 6$, and the observed Cook's $D_I$ statistic $y = 2.19331$, we can now easily compute from Equation (18) that the p-value is $p_I = 0.02181$.

For the Longley (1967) data set, Cook (1977) noted that observations 5 and 16 may be influential. To test for the joint influence of $I = \{5, 16\}$, we use the test statistic $D_I(\hat\beta, X'X, 2s_I^2)$ with $r = 2$, the canonical leverages $\lambda = (0.690029, 0.614130)$ for the weights, $\nu = N - r - k = 16 - 2 - 7 = 7$, and the observed Cook's $D_I$ statistic $y = 1.812433$; we compute that the p-value is $p_I = 0.12927$.

Using the test statistic $D_I(\hat\beta, X'X, 2s_I^2)$ and the global bounds in Equation (32), it is easy to compute that the only pairs that need to be considered at the 5% significance level are (1) $I_1 = \{4, 5\}$, with $\lambda = (0.615959, 0.371827)$, $y = 2.57861$, and $0.03822 < p_{I_1} = 0.04186 < 0.06356$; (2) $I_2 = \{4, 15\}$, with $\lambda = (0.505387, 0.393672)$, $y = 1.76885$, and $0.04961 < p_{I_2} = 0.04982 < 0.05555$; and (3) $I_3 = \{10, 16\}$, with $\lambda = (0.736874, 0.695572)$, $y = 2.57906$, and $0.03761 < p_{I_3} = 0.04571 < 0.07979$, where the p-values $p_I$ are computed from Equation (18).

Our recommendation to the practitioner who wishes to find joint outliers is to screen initially for potential joint outliers using Equation (32) with $D_I(\hat\beta, X'X, r s_I^2)$. If $r = 1$, the distribution of $D_I$ is a scaled central F distribution. If $r = 2$, the distribution of $D_I$ is a scaled ${}_2F_1$ series. If $r > 2$, use Equation (26) to find the number of terms required to make the local truncation error small; the suggested value for the bound is $10^{-4}$. The p-values for the distribution of $D_I(\hat\beta, X'X, r s_I^2)$ are then calculated using the enhanced truncated series in Equation (25).

5.2 Misspecified Hotelling's $T^2$ test

Hotelling's $T^2$ is used widely in multivariate data analysis, encompassing tests for means, the construction of confidence ellipsoids, the analysis of repeated measurements, and statistical process control. To support a knowledgeable use of $T^2$, its properties must be understood when model assumptions fail. Jensen and Ramirez (1991) have studied the misspecification of location and scale in the model for a multivariate experiment under the practical circumstances described below.

To set the notation, let $N_p(\mu, \Sigma)$ be the Gaussian distribution with mean $\mu$ and dispersion $\Sigma$, and let $W_p(\nu^*, \Sigma)$ denote the central Wishart distribution having $\nu^*$ degrees of freedom and scale parameter $\Sigma$. Consider the representation $T^2 = \nu^* Y'W^{-1}Y$, where $(Y, W)$ are independent and $\mathcal{L}(Y) = N_p(\mu, \Sigma)$ as before, but now $\mathcal{L}(W) = W_p(\nu^*, \Omega)$. Denote the ordered roots of $\Omega^{-1/2}\Sigma\Omega^{-1/2}$ by $\{\pi_1 \ge \pi_2 \ge \cdots \ge \pi_p > 0\}$. A principal result for $T^2$ under misspecified scale is given in Jensen and Ramirez (1991) and is the following.

Theorem 7 The distribution of the test statistic $((\nu^* - p + 1)/p)(T^2/\nu^*)$ is the generalized F distribution $F_r(w; \pi_1, \ldots, \pi_p; 1, \ldots, 1; \nu^* - p + 1)$.
5.2.1 Hotelling's misspecified scale distribution

The conventional model for $T^2$ is based on a random sample $\{X_1, \ldots, X_N\}$ from $N_p(\mu, \Sigma)$, using the unbiased sample mean and dispersion matrix $(\bar X, S)$. We have $\mathcal{L}(\bar X) = N_p(\mu, \frac{1}{N}\Sigma)$ and $\mathcal{L}((N - 1)S) = W_p(N - 1, \Sigma)$, or $\mathcal{L}(\frac{N-1}{N}S) = W_p(N - 1, \frac{1}{N}\Sigma)$. Thus $T^2 = (N - 1)(\bar X - \mu)'(\frac{N-1}{N}S)^{-1}(\bar X - \mu) = N(\bar X - \mu)'S^{-1}(\bar X - \mu)$, and $\mathcal{L}(((N - p)/p)(T^2/(N - 1))) = F(p, N - p)$, the central F distribution, when $N > p$. If the process dispersion parameters have shifted, then $T^2$ is misspecified with $\mathcal{L}((N - 1)S) = W_p(N - 1, \Omega)$, and $((N - p)/p)(T^2/(N - 1))$ has the generalized F distribution $F_r(w; \pi_1, \ldots, \pi_p; 1, \ldots, 1; N - p)$. Here $r = p$, $\nu = \nu^* - p + 1 = N - p$, and $\{\pi_1 \ge \pi_2 \ge \cdots \ge \pi_p > 0\}$ are the ordered roots of $\Omega^{-1/2}\Sigma\Omega^{-1/2}$.

5.2.2 Examples 

An important application of generalized F distributions is computing the power of a misspecified Hotelling's $T^2$ test for a multivariate quality control chart. Power analysis for a misspecified mean $\mu$ is standard. Using generalized F distributions, the power analysis for a misspecified covariance $\Sigma$ can be performed. If a process changes, not only will the mean change, but generally the covariance structure will also change. The robustness of $T^2$ under misspecification of scale can be verified by computing the cumulative distribution of $T^2$ for varying choices of $\pi_1 \ge \pi_2 \ge \cdots \ge \pi_p > 0$ at the critical value of $T^2$. For example, if $\Omega$ is a $3 \times 3$ equicorrelated matrix ($r = p = 3$) with $\rho = 0.5$, and if $\Sigma$ is the identity matrix, then the eigenvalues of $\Omega^{-1/2}\Sigma\Omega^{-1/2}$ are $\{\pi_1 = (1 - \rho)^{-1}, \pi_2 = (1 - \rho)^{-1}, \pi_3 = (1 + 2\rho)^{-1}\} = \{2, 2, 1/2\}$. If $N = 12$, with $\nu = N - p = 9$, the nominal 95% critical value of $((N - p)/p)(T^2/(N - 1))$ is $F(0.95; p, N - p) = 3.8625$. However, the exact right-hand tail probability for $Y = ((N - p)/p)(T^2/(N - 1))$ is not 0.05 but rather $P[Y > 3.8625] = 0.12310$. In this example $\pi_1 = \pi_2$, so we could compute the p-values exactly from Theorem 2, with $F_3(w; \pi_1, \pi_2, \pi_3; 1, 1, 1; N - p) = F_2(w; \pi_1, \pi_3; 2, 1; N - p)$. Instead, we use this problem to demonstrate the number of terms required by the three numerical methods discussed in this paper.
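The right-hand tail probabilities in the last column of Table 1 can be recomputed from the series (22). The Python sketch below assumes $\Sigma = I$ and uses the closed-form roots of an equicorrelated $\Omega$, namely $(1 - \rho)^{-1}$ with multiplicity $p - 1$ and $(1 + (p - 1)\rho)^{-1}$; the fixed truncation length and the function names are illustrative assumptions.

```python
import math


def mixture_coefficients(a, m, n_terms):
    """c_0..c_{n_terms-1} from identity (3), by multiplying the negative binomial series."""
    a_r = min(a)
    A = math.prod((a_r / ai) ** (mi / 2.0) for ai, mi in zip(a, m))
    c = [1.0] + [0.0] * (n_terms - 1)
    for ai, mi in zip(a, m):
        u = 1.0 - a_r / ai
        if u == 0.0:
            continue
        b = [1.0]
        for n in range(1, n_terms):
            b.append(b[-1] * (mi / 2.0 + n - 1) / n * u)
        c = [sum(c[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]
    return [A * cj for cj in c]


def cdf_series(y, a, m, nu, n_terms=400):
    """P[W <= y] from the series (22) with a fixed number of terms."""
    mt, a_r = sum(m), min(a)
    alpha, beta = (nu + mt) / 2, mt / 2
    a_const = nu * a_r / mt
    t = y / (a_const + y)
    log_k = ((nu / 2) * math.log(a_const) + math.lgamma(alpha) - math.lgamma(beta)
             - math.lgamma(nu / 2) + beta * math.log(y) - alpha * math.log(a_const + y)
             - math.log(beta))
    total, ratio, partial = 0.0, 1.0, 0.0
    for j, cj in enumerate(mixture_coefficients(a, m, n_terms)):
        partial += cj
        total += ratio * partial
        ratio *= (alpha + j) / (beta + 1 + j) * t
    return math.exp(log_k) * total


def misspecified_roots(rho, p):
    """Roots of Omega^{-1/2} Sigma Omega^{-1/2} for Sigma = I and equicorrelated Omega."""
    roots = [1 / (1 - rho)] * (p - 1) + [1 / (1 + (p - 1) * rho)]
    return tuple(sorted(roots, reverse=True))


def misspecified_type_one_error(rho, p=3, n=12, crit=3.8625):
    """P[Y > crit] for Y = ((N - p)/p)(T^2/(N - 1)) under the misspecified scale."""
    return 1.0 - cdf_series(crit, misspecified_roots(rho, p), (1,) * p, n - p)


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.5):
        print(rho, misspecified_roots(rho, 3), round(misspecified_type_one_error(rho), 4))
```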

In Table 1, we present similar computations for varying $\rho$. For each $\rho$ in Table 1, with the corresponding eigenvalues $\pi_1 \ge \pi_2 \ge \pi_3 > 0$ of $\Omega^{-1/2}\Sigma\Omega^{-1/2}$, we give the value of $P[Y = ((N - p)/p)(T^2/(N - 1)) > 3.8625]$. Also shown are the numbers of terms required by the three numerical methods presented in this paper. The first is $\tau_1$ from Equation (7), required to satisfy $y\,\epsilon_\tau < 10^{-4}$; the second is $\tau_2$ from Equation (28), required to satisfy $y\,\epsilon_\tau(y) < 10^{-4}$; and the third is $\tau_3$ from Equation (26), required to satisfy $\epsilon^*_\tau(y) < 10^{-4}$. The inputs are $r = 3$, the weights $\pi_1 \ge \pi_2 \ge \pi_3 > 0$, $\nu = N - p = 12 - 3 = 9$, and $y = 3.8625$.






Table 1. Misspecified Type I Error

ρ      τ1     τ2     τ3     P[Y > 3.8625]
0.0     1      1      1     0.0500
0.1     6      7      6     0.0526
0.2    10     11      8     0.0600
0.3    15     15     12     0.0727
0.4    20     20     16     0.0926
0.5    28     26     21     0.1231
0.6    40     32     27     0.1704
0.7    58     40     34     0.2458
0.8    92     49     43     0.3712
0.9   185     58     55     0.59055



As anticipated, the number of terms $\tau$ required is smaller when the enhanced partial sum from Equation (25) is used. More importantly, the method from Section 4.1 does not require the pdf to be numerically integrated.



6 Conclusion 

We have derived the exact distribution of the generalized F distribution $F_2(w; a_1, a_2; m_1, m_2; \nu)$ in terms of the hypergeometric series ${}_2F_1$. This extends the corresponding result of Bock and Solomon for a mixture of two chi-square distributions to the generalized F distribution with $r = 2$. Explicit representations for the case $r > 2$ are given in terms of Lauricella $F_D$ functions. Numerically computable series expansions have been derived. Applications to the detection of joint outliers and to the misspecified Hotelling $T^2$ statistic have been given.